9
Probability and Likelihood
Problem. A protein consists of 300 amino acids, of which it is known that there are
2 cysteines. A 50-mer fragment has been prepared. What are the probabilities that 0,
1, or 2 cysteines are present in the fragment?
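One way to approach this problem is as sampling without replacement. The sketch below assumes that the two cysteines are equally likely to occupy any positions in the sequence, so that the number falling within the 50 residues of the fragment follows a hypergeometric distribution; that modelling assumption, and the function name `cysteine_count_probabilities`, are illustrative and not taken from the text.

```python
from fractions import Fraction
from math import comb


def cysteine_count_probabilities(protein_length=300, n_cysteines=2, fragment_length=50):
    """Hypergeometric probabilities of finding k cysteines in the fragment.

    Assumption (not from the text): the cysteine positions are uniformly
    random, so every choice of fragment_length residues out of
    protein_length is treated as equally likely.
    """
    total = comb(protein_length, fragment_length)
    probs = {}
    for k in range(n_cysteines + 1):
        # choose k of the cysteines and the remaining residues from the non-cysteines
        ways = comb(n_cysteines, k) * comb(protein_length - n_cysteines, fragment_length - k)
        probs[k] = Fraction(ways, total)
    return probs


probs = cysteine_count_probabilities()
for k, p in probs.items():
    print(k, p, round(float(p), 4))
```

Under this assumption the probabilities are approximately 0.694, 0.279 and 0.027 for 0, 1 and 2 cysteines respectively, and they sum to 1.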
9.3.3 The Law of Large Numbers
Consider Bernoulli trials (Sect. 9.2.3). With each trial, the number of successes $\mathbf{S}_n$ increases by 1
(for success) or 0 (for failure), hence
$$\mathbf{S}_n = \mathbf{X}_1 + \cdots + \mathbf{X}_n , \tag{9.43}$$
where the random variable $\mathbf{X}_k$ equals 1 (with probability $p$) if the $k$th trial results in
success, otherwise 0 (with probability $q$); $\mathbf{S}_n$ is thus a sum of $n$ mutually independent
random variables. The weak law of large numbers states that for large $n$, the average
proportion of successes $\mathbf{S}_n/n$ is likely to be near $p$. More generally, if the sequence
$\{\mathbf{X}_k\}$ has a common, arbitrary distribution, then for every $\varepsilon > 0$ as $n \to \infty$,
$$P\left\{ \left| \frac{\mathbf{X}_1 + \cdots + \mathbf{X}_n}{n} - \mu \right| > \varepsilon \right\} \to 0 , \tag{9.44}$$
provided that the expectation $\mu$ exists; $\varepsilon$ is an arbitrarily prescribed small number. For
variable distributions (i.e., when the $\mathbf{X}_k$ do not share a common distribution), the law holds for the sequence $\{\mathbf{X}_k\}$ if for every $\varepsilon > 0$
$$P\left\{ \frac{\left| \mathbf{S}_n - m_n \right|}{n} > \varepsilon \right\} \to 0 , \tag{9.45}$$
where $m_n$ is the mean of $\mathbf{S}_n$; a sufficient condition for the law to hold is that
$$\frac{s_n}{n} \to 0 , \tag{9.46}$$
where $s_n^2$ is the variance of the sum $\mathbf{S}_n$. This does not imply that $|\mathbf{S}_n - m_n|/n$
remains small for all large $n$; it may continue to fluctuate, and the law only specifies
that large values of $|\mathbf{S}_n - m_n|/n$ occur infrequently. For an overwhelming
probability that it remains small for all sufficiently large $n$, the strong law of large numbers is required.$^{16}$
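The weak law for Bernoulli trials can be illustrated numerically. The following is a minimal simulation sketch, not a proof: it estimates $P\{|\mathbf{S}_n/n - p| > \varepsilon\}$ by repeating $n$ Bernoulli trials many times and counting how often the proportion of successes deviates from $p$ by more than $\varepsilon$. The values $p = 0.3$ and $\varepsilon = 0.05$, the number of repetitions, and the function name `fraction_outside` are arbitrary illustrative choices. For Bernoulli trials $s_n = \sqrt{npq}$, so $s_n/n = \sqrt{pq/n} \to 0$ and the sufficient condition (9.46) is satisfied.

```python
import random


def fraction_outside(n, p=0.3, eps=0.05, n_repeats=2000, seed=0):
    """Estimate P{|S_n/n - p| > eps} from n_repeats independent runs of n Bernoulli trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outside = 0
    for _ in range(n_repeats):
        # S_n = X_1 + ... + X_n, with X_k = 1 with probability p and 0 otherwise
        s_n = sum(1 for _ in range(n) if rng.random() < p)
        if abs(s_n / n - p) > eps:
            outside += 1
    return outside / n_repeats


for n in (10, 100, 1000):
    print(n, fraction_outside(n))
```

The estimated fraction of runs lying outside the band $p \pm \varepsilon$ should shrink as $n$ grows, in line with (9.44); it says nothing about whether a single long run stays inside the band for all subsequent $n$, which is the province of the strong law.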
9.3.4 Additive and Multiplicative Processes
Many natural processes are random additive processes; for example, a displacement
is the sum of random steps (to the left or to the right in the case of the one-dimensional
16 See Feller (1967) for details.